Overview

Dataset statistics

Number of variables: 17
Number of observations: 492820
Missing cells: 1203113
Missing cells (%): 14.4%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 63.9 MiB
Average record size in memory: 136.0 B
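
The percentage and per-record figures above follow from the raw counts. The sketch below re-derives them as a sanity check. It assumes that "Missing cells (%)" is taken over all cells (variables × observations) and that 1 MiB is 1024 × 1024 bytes; the variable names are illustrative and not part of the report.

```python
# Figures copied from the "Dataset statistics" table.
n_variables = 17
n_observations = 492_820
missing_cells = 1_203_113
total_size_mib = 63.9  # already rounded in the report, so results are approximate

# Assumption: the missing-cell percentage is computed over every cell in the frame.
total_cells = n_variables * n_observations
missing_pct = 100 * missing_cells / total_cells

# Assumption: 1 MiB = 1024 * 1024 bytes.
avg_record_bytes = total_size_mib * 1024 * 1024 / n_observations

print(f"total cells:          {total_cells}")
print(f"missing cells (%):    {missing_pct:.1f}")
print(f"avg record size (B):  {avg_record_bytes:.1f}")
```

Both derived values agree with the table after rounding, which suggests the report's totals are internally consistent.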

Variable types

Numeric: 8
Categorical: 8
Unsupported: 1

Alerts

Date has a high cardinality: 577 distinct values (High cardinality)
DayOfWeek is highly correlated with Open (High correlation)
Open is highly correlated with DayOfWeek (High correlation)
StoreType is highly correlated with Assortment (High correlation)
Assortment is highly correlated with StoreType (High correlation)
PromoInterval is highly correlated with Promo2 (High correlation)
Promo2 is highly correlated with PromoInterval (High correlation)
CompetitionOpenSinceYear is highly correlated with Promo2SinceWeek (High correlation)
Promo2SinceWeek is highly correlated with CompetitionOpenSinceYear and 2 other fields (High correlation)
Promo2SinceYear is highly correlated with Promo2SinceWeek and 1 other field (High correlation)
PromoInterval is highly correlated with Promo2SinceWeek and 1 other field (High correlation)
DayOfWeek has 14754 (3.0%) missing values (Missing)
Open has 14860 (3.0%) missing values (Missing)
Promo has 14914 (3.0%) missing values (Missing)
StateHoliday has 14931 (3.0%) missing values (Missing)
SchoolHoliday has 14978 (3.0%) missing values (Missing)
StoreType has 12819 (2.6%) missing values (Missing)
Assortment has 12819 (2.6%) missing values (Missing)
CompetitionDistance has 14080 (2.9%) missing values (Missing)
CompetitionOpenSinceMonth has 165456 (33.6%) missing values (Missing)
CompetitionOpenSinceYear has 165456 (33.6%) missing values (Missing)
Promo2 has 12819 (2.6%) missing values (Missing)
Promo2SinceWeek has 248409 (50.4%) missing values (Missing)
Promo2SinceYear has 248409 (50.4%) missing values (Missing)
PromoInterval has 248409 (50.4%) missing values (Missing)
df_index is uniformly distributed (Uniform)
df_index has unique values (Unique)
StateHoliday is an unsupported type; check whether it needs cleaning or further analysis (Unsupported)
Store has 12819 (2.6%) zeros (Zeros)

Reproduction

Analysis started: 2021-10-28 13:33:19.234123
Analysis finished: 2021-10-28 13:33:53.150181
Duration: 33.92 seconds
Software version: pandas-profiling v3.1.0
Download configuration: config.json

Variables

df_index
Real number (ℝ≥0)

UNIFORM
UNIQUE

Distinct: 492820
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 307966.5224
Minimum: 1
Maximum: 616024
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 3.8 MiB

Quantile statistics

Minimum: 1
5-th percentile: 30726.95
Q1: 153875.75
Median: 307825.5
Q3: 462052.25
95-th percentile: 585249.1
Maximum: 616024
Range: 616023
Interquartile range (IQR): 308176.5

Descriptive statistics

Standard deviation: 177861.5477
Coefficient of variation (CV): 0.5775353317
Kurtosis: -1.19994135
Mean: 307966.5224
Median Absolute Deviation (MAD): 154080.5
Skewness: 0.0009372779955
Sum: 1.517720616 × 10^11
Variance: 3.163473015 × 10^10
Monotonicity: Not monotonic
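
Several of the df_index figures above are derived from others in the same tables. As a rough cross-check, the sketch below recomputes the range, IQR, coefficient of variation and variance from the reported minimum, maximum, quartiles, mean and standard deviation. The inputs are the rounded values printed in the report, so agreement is only expected to within rounding.

```python
import math

# Values copied from the df_index tables.
mean = 307966.5224
std = 177861.5477
q1, q3 = 153875.75, 462052.25
minimum, maximum = 1, 616024

value_range = maximum - minimum  # Range = Maximum - Minimum
iqr = q3 - q1                    # Interquartile range = Q3 - Q1
cv = std / mean                  # Coefficient of variation = std / mean
variance = std ** 2              # Variance = std squared

print(value_range, iqr, round(cv, 10), f"{variance:.9e}")

# The reported values, for comparison.
assert math.isclose(cv, 0.5775353317, rel_tol=1e-6)
assert math.isclose(variance, 3.163473015e10, rel_tol=1e-6)
```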
Histogram with fixed size bins (bins=50)
Value | Count | Frequency (%)
231958 | 1 | < 0.1%
5603 | 1 | < 0.1%
283910 | 1 | < 0.1%
134306 | 1 | < 0.1%
322723 | 1 | < 0.1%
300487 | 1 | < 0.1%
468067 | 1 | < 0.1%
493234 | 1 | < 0.1%
469662 | 1 | < 0.1%
250902 | 1 | < 0.1%
Other values (492810) | 492810 | > 99.9%

Value | Count | Frequency (%)
1 | 1 | < 0.1%
3 | 1 | < 0.1%
4 | 1 | < 0.1%
5 | 1 | < 0.1%
8 | 1 | < 0.1%
9 | 1 | < 0.1%
10 | 1 | < 0.1%
12 | 1 | < 0.1%
13 | 1 | < 0.1%
14 | 1 | < 0.1%

Value | Count | Frequency (%)
616024 | 1 | < 0.1%
616023 | 1 | < 0.1%
616022 | 1 | < 0.1%
616021 | 1 | < 0.1%
616020 | 1 | < 0.1%
616019 | 1 | < 0.1%
616018 | 1 | < 0.1%
616016 | 1 | < 0.1%
616014 | 1 | < 0.1%
616013 | 1 | < 0.1%

Date
Categorical

HIGH CARDINALITY

Distinct: 577
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 3.8 MiB

Value | Count
2013-04-23 | 913
2013-12-30 | 900
2013-08-31 | 900
2013-07-25 | 900
2013-11-08 | 900
Other values (572) | 488307

Length

Max length: 10
Median length: 10
Mean length: 10
Min length: 10

Characters and Unicode

Total characters: 0
Distinct characters: 0
Distinct categories: 0
Distinct scripts: 0
Distinct blocks: 0
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 2013-08-04
2nd row: 2013-09-22
3rd row: 2014-03-19
4th row: 2014-07-08
5th row: 2014-01-23

Common Values

Value | Count | Frequency (%)
2013-04-23 | 913 | 0.2%
2013-12-30 | 900 | 0.2%
2013-08-31 | 900 | 0.2%
2013-07-25 | 900 | 0.2%
2013-11-08 | 900 | 0.2%
2013-12-04 | 899 | 0.2%
2013-09-27 | 899 | 0.2%
2014-04-09 | 896 | 0.2%
2014-06-11 | 896 | 0.2%
2013-03-06 | 895 | 0.2%
Other values (567) | 483822 | 98.2%

Length

Histogram of lengths of the category

Value | Count | Frequency (%)
2013-04-23 | 913 | 0.2%
2013-07-25 | 900 | 0.2%
2013-11-08 | 900 | 0.2%
2013-12-30 | 900 | 0.2%
2013-08-31 | 900 | 0.2%
2013-12-04 | 899 | 0.2%
2013-09-27 | 899 | 0.2%
2014-04-09 | 896 | 0.2%
2014-06-11 | 896 | 0.2%
2013-01-23 | 895 | 0.2%
Other values (567) | 483822 | 98.2%

Store
Real number (ℝ≥0)

ZEROS

Distinct: 1116
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 543.4232661
Minimum: 0
Maximum: 1115
Zeros: 12819
Zeros (%): 2.6%
Negative: 0
Negative (%): 0.0%
Memory size: 3.8 MiB

Quantile statistics

Minimum: 0
5-th percentile: 28
Q1: 257
Median: 543
Q3: 829
95-th percentile: 1058
Maximum: 1115
Range: 1115
Interquartile range (IQR): 572

Descriptive statistics

Standard deviation: 329.8690867
Coefficient of variation (CV): 0.6070205441
Kurtosis: -1.207634784
Mean: 543.4232661
Median Absolute Deviation (MAD): 286
Skewness: 0.006638946392
Sum: 267809854
Variance: 108813.6143
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Value | Count | Frequency (%)
0 | 12819 | 2.6%
379 | 469 | 0.1%
195 | 466 | 0.1%
465 | 464 | 0.1%
939 | 461 | 0.1%
553 | 460 | 0.1%
74 | 460 | 0.1%
484 | 460 | 0.1%
360 | 460 | 0.1%
752 | 460 | 0.1%
Other values (1106) | 475841 | 96.6%

Value | Count | Frequency (%)
0 | 12819 | 2.6%
1 | 430 | 0.1%
2 | 437 | 0.1%
3 | 428 | 0.1%
4 | 436 | 0.1%
5 | 455 | 0.1%
6 | 435 | 0.1%
7 | 447 | 0.1%
8 | 441 | 0.1%
9 | 435 | 0.1%

Value | Count | Frequency (%)
1115 | 436 | 0.1%
1114 | 453 | 0.1%
1113 | 427 | 0.1%
1112 | 435 | 0.1%
1111 | 431 | 0.1%
1110 | 439 | 0.1%
1109 | 401 | 0.1%
1108 | 430 | 0.1%
1107 | 423 | 0.1%
1106 | 418 | 0.1%

DayOfWeek
Real number (ℝ≥0)

HIGH CORRELATION
MISSING

Distinct: 7
Distinct (%): < 0.1%
Missing: 14754
Missing (%): 3.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 3.983922722
Minimum: 1
Maximum: 7
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 3.8 MiB

Quantile statistics

Minimum: 1
5-th percentile: 1
Q1: 2
Median: 4
Q3: 6
95-th percentile: 7
Maximum: 7
Range: 6
Interquartile range (IQR): 4

Descriptive statistics

Standard deviation: 1.993413915
Coefficient of variation (CV): 0.500364604
Kurtosis: -1.244316177
Mean: 3.983922722
Median Absolute Deviation (MAD): 2
Skewness: 0.009882459474
Sum: 1904578
Variance: 3.973699037
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=7)

Value | Count | Frequency (%)
2 | 69127 | 14.0%
3 | 68925 | 14.0%
4 | 68831 | 14.0%
6 | 68286 | 13.9%
1 | 68260 | 13.9%
5 | 68105 | 13.8%
7 | 66532 | 13.5%
(Missing) | 14754 | 3.0%

Value | Count | Frequency (%)
1 | 68260 | 13.9%
2 | 69127 | 14.0%
3 | 68925 | 14.0%
4 | 68831 | 14.0%
5 | 68105 | 13.8%
6 | 68286 | 13.9%
7 | 66532 | 13.5%

Value | Count | Frequency (%)
7 | 66532 | 13.5%
6 | 68286 | 13.9%
5 | 68105 | 13.8%
4 | 68831 | 14.0%
3 | 68925 | 14.0%
2 | 69127 | 14.0%
1 | 68260 | 13.9%

Open
Categorical

HIGH CORRELATION
MISSING

Distinct: 2
Distinct (%): < 0.1%
Missing: 14860
Missing (%): 3.0%
Memory size: 3.8 MiB

Value | Count
1.0 | 398098
0.0 | 79862

Length

Max length: 3
Median length: 3
Mean length: 3
Min length: 3

Characters and Unicode

Total characters: 0
Distinct characters: 0
Distinct categories: 0
Distinct scripts: 0
Distinct blocks: 0

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 0.0
2nd row: 0.0
3rd row: 1.0
4th row: 1.0
5th row: 1.0

Common Values

Value | Count | Frequency (%)
1.0 | 398098 | 80.8%
0.0 | 79862 | 16.2%
(Missing) | 14860 | 3.0%

Length

Histogram of lengths of the category

Pie chart

Value | Count | Frequency (%)
1.0 | 398098 | 83.3%
0.0 | 79862 | 16.7%
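
The two Open tables above give different percentages for the same counts. The figures are consistent with the common-values table dividing by all rows (missing included) and the pie-chart table dividing by non-missing rows only. The sketch below checks that reading; it is an interpretation of the numbers, not a statement of how the report computes them.

```python
# Counts copied from the "Open" tables.
counts = {"1.0": 398_098, "0.0": 79_862}
missing = 14_860

n_present = sum(counts.values())  # rows with a non-missing Open value
n_total = n_present + missing     # all rows

# Share of every row in the dataset (missing rows stay in the denominator).
share_of_all_rows = {k: 100 * v / n_total for k, v in counts.items()}
# Share of rows that actually have a value (missing rows excluded).
share_of_present = {k: 100 * v / n_present for k, v in counts.items()}

for value in counts:
    print(value,
          f"{share_of_all_rows[value]:.1f}% of all rows,",
          f"{share_of_present[value]:.1f}% of non-missing rows")
```

The same relationship appears to hold for the other categorical variables in this report that have missing values.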

Promo
Categorical

MISSING

Distinct: 2
Distinct (%): < 0.1%
Missing: 14914
Missing (%): 3.0%
Memory size: 3.8 MiB

Value | Count
0.0 | 300195
1.0 | 177711

Length

Max length: 3
Median length: 3
Mean length: 3
Min length: 3

Characters and Unicode

Total characters: 0
Distinct characters: 0
Distinct categories: 0
Distinct scripts: 0
Distinct blocks: 0

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 0.0
2nd row: 0.0
3rd row: 1.0
4th row: 0.0
5th row: 1.0

Common Values

Value | Count | Frequency (%)
0.0 | 300195 | 60.9%
1.0 | 177711 | 36.1%
(Missing) | 14914 | 3.0%

Length

Histogram of lengths of the category

Pie chart

Value | Count | Frequency (%)
0.0 | 300195 | 62.8%
1.0 | 177711 | 37.2%

StateHoliday
Unsupported

MISSING
REJECTED
UNSUPPORTED

Missing: 14931
Missing (%): 3.0%
Memory size: 3.8 MiB

SchoolHoliday
Categorical

MISSING

Distinct: 2
Distinct (%): < 0.1%
Missing: 14978
Missing (%): 3.0%
Memory size: 3.8 MiB

Value | Count
0.0 | 394824
1.0 | 83018

Length

Max length: 3
Median length: 3
Mean length: 3
Min length: 3

Characters and Unicode

Total characters: 0
Distinct characters: 0
Distinct categories: 0
Distinct scripts: 0
Distinct blocks: 0

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 0.0
2nd row: 0.0
3rd row: 0.0
4th row: 0.0
5th row: 0.0

Common Values

Value | Count | Frequency (%)
0.0 | 394824 | 80.1%
1.0 | 83018 | 16.8%
(Missing) | 14978 | 3.0%

Length

Histogram of lengths of the category

Pie chart

Value | Count | Frequency (%)
0.0 | 394824 | 82.6%
1.0 | 83018 | 17.4%

StoreType
Categorical

HIGH CORRELATION
MISSING

Distinct: 4
Distinct (%): < 0.1%
Missing: 12819
Missing (%): 2.6%
Memory size: 3.8 MiB

Value | Count
a | 259575
d | 149310
c | 63771
b | 7345

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 0
Distinct characters: 0
Distinct categories: 0
Distinct scripts: 0
Distinct blocks: 0

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: d
2nd row: c
3rd row: d
4th row: d
5th row: c

Common Values

Value | Count | Frequency (%)
a | 259575 | 52.7%
d | 149310 | 30.3%
c | 63771 | 12.9%
b | 7345 | 1.5%
(Missing) | 12819 | 2.6%

Length

Histogram of lengths of the category

Pie chart

Value | Count | Frequency (%)
a | 259575 | 54.1%
d | 149310 | 31.1%
c | 63771 | 13.3%
b | 7345 | 1.5%

Assortment
Categorical

HIGH CORRELATION
MISSING

Distinct: 3
Distinct (%): < 0.1%
Missing: 12819
Missing (%): 2.6%
Memory size: 3.8 MiB

Value | Count
a | 255121
c | 220989
b | 3891

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 0
Distinct characters: 0
Distinct categories: 0
Distinct scripts: 0
Distinct blocks: 0

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: a
2nd row: a
3rd row: a
4th row: c
5th row: c

Common Values

Value | Count | Frequency (%)
a | 255121 | 51.8%
c | 220989 | 44.8%
b | 3891 | 0.8%
(Missing) | 12819 | 2.6%

Length

Histogram of lengths of the category

Pie chart

Value | Count | Frequency (%)
a | 255121 | 53.2%
c | 220989 | 46.0%
b | 3891 | 0.8%

CompetitionDistance
Real number (ℝ≥0)

MISSING

Distinct: 654
Distinct (%): 0.1%
Missing: 14080
Missing (%): 2.9%
Infinite: 0
Infinite (%): 0.0%
Mean: 5411.996888
Minimum: 20
Maximum: 75860
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 3.8 MiB

Quantile statistics

Minimum: 20
5-th percentile: 140
Q1: 710
Median: 2320
Q3: 6880
95-th percentile: 20260
Maximum: 75860
Range: 75840
Interquartile range (IQR): 6170

Descriptive statistics

Standard deviation: 7676.971802
Coefficient of variation (CV): 1.418510018
Kurtosis: 13.02078342
Mean: 5411.996888
Median Absolute Deviation (MAD): 1970
Skewness: 2.926860602
Sum: 2590939390
Variance: 58935896.06
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Value | Count | Frequency (%)
250 | 5197 | 1.1%
1200 | 3705 | 0.8%
50 | 3469 | 0.7%
350 | 3442 | 0.7%
190 | 3433 | 0.7%
90 | 3056 | 0.6%
180 | 3011 | 0.6%
330 | 3001 | 0.6%
150 | 2986 | 0.6%
110 | 2625 | 0.5%
Other values (644) | 444815 | 90.3%
(Missing) | 14080 | 2.9%

Value | Count | Frequency (%)
20 | 426 | 0.1%
30 | 1720 | 0.3%
40 | 2164 | 0.4%
50 | 3469 | 0.7%
60 | 1289 | 0.3%
70 | 2147 | 0.4%
80 | 1289 | 0.3%
90 | 3056 | 0.6%
100 | 2142 | 0.4%
110 | 2625 | 0.5%

Value | Count | Frequency (%)
75860 | 431 | 0.1%
58260 | 447 | 0.1%
48330 | 445 | 0.1%
46590 | 431 | 0.1%
45740 | 437 | 0.1%
44320 | 433 | 0.1%
40860 | 430 | 0.1%
40540 | 428 | 0.1%
38710 | 427 | 0.1%
38630 | 438 | 0.1%

CompetitionOpenSinceMonth
Real number (ℝ≥0)

MISSING

Distinct: 12
Distinct (%): < 0.1%
Missing: 165456
Missing (%): 33.6%
Infinite: 0
Infinite (%): 0.0%
Mean: 7.221157488
Minimum: 1
Maximum: 12
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 3.8 MiB

Quantile statistics

Minimum: 1
5-th percentile: 2
Q1: 4
Median: 8
Q3: 10
95-th percentile: 12
Maximum: 12
Range: 11
Interquartile range (IQR): 6

Descriptive statistics

Standard deviation: 3.211335959
Coefficient of variation (CV): 0.444712079
Kurtosis: -1.244578834
Mean: 7.221157488
Median Absolute Deviation (MAD): 3
Skewness: -0.1687786788
Sum: 2363947
Variance: 10.31267864
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=12)

Value | Count | Frequency (%)
9 | 53861 | 10.9%
4 | 40751 | 8.3%
11 | 39599 | 8.0%
3 | 30123 | 6.1%
7 | 28738 | 5.8%
12 | 27475 | 5.6%
10 | 26173 | 5.3%
6 | 21540 | 4.4%
5 | 18738 | 3.8%
2 | 17689 | 3.6%
Other values (2) | 22677 | 4.6%
(Missing) | 165456 | 33.6%

Value | Count | Frequency (%)
1 | 6012 | 1.2%
2 | 17689 | 3.6%
3 | 30123 | 6.1%
4 | 40751 | 8.3%
5 | 18738 | 3.8%
6 | 21540 | 4.4%
7 | 28738 | 5.8%
8 | 16665 | 3.4%
9 | 53861 | 10.9%
10 | 26173 | 5.3%

Value | Count | Frequency (%)
12 | 27475 | 5.6%
11 | 39599 | 8.0%
10 | 26173 | 5.3%
9 | 53861 | 10.9%
8 | 16665 | 3.4%
7 | 28738 | 5.8%
6 | 21540 | 4.4%
5 | 18738 | 3.8%
4 | 40751 | 8.3%
3 | 30123 | 6.1%

CompetitionOpenSinceYear
Real number (ℝ≥0)

HIGH CORRELATION
MISSING

Distinct: 23
Distinct (%): < 0.1%
Missing: 165456
Missing (%): 33.6%
Infinite: 0
Infinite (%): 0.0%
Mean: 2008.687372
Minimum: 1900
Maximum: 2015
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 3.8 MiB

Quantile statistics

Minimum: 1900
5-th percentile: 2001
Q1: 2006
Median: 2010
Q3: 2013
95-th percentile: 2015
Maximum: 2015
Range: 115
Interquartile range (IQR): 7

Descriptive statistics

Standard deviation: 6.075623392
Coefficient of variation (CV): 0.003024673463
Kurtosis: 124.6947003
Mean: 2008.687372
Median Absolute Deviation (MAD): 3
Skewness: -7.755642173
Sum: 657571933
Variance: 36.9131996
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=23)

Value | Count | Frequency (%)
2013 | 35758 | 7.3%
2012 | 35264 | 7.2%
2014 | 30115 | 6.1%
2005 | 26516 | 5.4%
2010 | 23751 | 4.8%
2011 | 23372 | 4.7%
2009 | 23251 | 4.7%
2008 | 23165 | 4.7%
2007 | 20638 | 4.2%
2006 | 20226 | 4.1%
Other values (13) | 65308 | 13.3%
(Missing) | 165456 | 33.6%

Value | Count | Frequency (%)
1900 | 388 | 0.1%
1961 | 442 | 0.1%
1990 | 2167 | 0.4%
1994 | 873 | 0.2%
1995 | 852 | 0.2%
1998 | 430 | 0.1%
1999 | 3434 | 0.7%
2000 | 4313 | 0.9%
2001 | 6833 | 1.4%
2002 | 11645 | 2.4%

Value | Count | Frequency (%)
2015 | 16406 | 3.3%
2014 | 30115 | 6.1%
2013 | 35758 | 7.3%
2012 | 35264 | 7.2%
2011 | 23372 | 4.7%
2010 | 23751 | 4.8%
2009 | 23251 | 4.7%
2008 | 23165 | 4.7%
2007 | 20638 | 4.2%
2006 | 20226 | 4.1%

Promo2
Categorical

HIGH CORRELATION
MISSING

Distinct: 2
Distinct (%): < 0.1%
Missing: 12819
Missing (%): 2.6%
Memory size: 3.8 MiB

Value | Count
1.0 | 244411
0.0 | 235590

Length

Max length: 3
Median length: 3
Mean length: 3
Min length: 3

Characters and Unicode

Total characters: 0
Distinct characters: 0
Distinct categories: 0
Distinct scripts: 0
Distinct blocks: 0

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 1.0
2nd row: 1.0
3rd row: 0.0
4th row: 1.0
5th row: 0.0

Common Values

Value | Count | Frequency (%)
1.0 | 244411 | 49.6%
0.0 | 235590 | 47.8%
(Missing) | 12819 | 2.6%

Length

Histogram of lengths of the category

Pie chart

Value | Count | Frequency (%)
1.0 | 244411 | 50.9%
0.0 | 235590 | 49.1%

Promo2SinceWeek
Real number (ℝ≥0)

HIGH CORRELATION
MISSING

Distinct: 24
Distinct (%): < 0.1%
Missing: 248409
Missing (%): 50.4%
Infinite: 0
Infinite (%): 0.0%
Mean: 23.51080352
Minimum: 1
Maximum: 50
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 3.8 MiB

Quantile statistics

Minimum: 1
5-th percentile: 1
Q1: 13
Median: 22
Q3: 37
95-th percentile: 45
Maximum: 50
Range: 49
Interquartile range (IQR): 24

Descriptive statistics

Standard deviation: 14.11951074
Coefficient of variation (CV): 0.6005541549
Kurtosis: -1.382741288
Mean: 23.51080352
Median Absolute Deviation (MAD): 13
Skewness: 0.08074705676
Sum: 5746299
Variance: 199.3605835
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=24)

Value | Count | Frequency (%)
14 | 34672 | 7.0%
40 | 32233 | 6.5%
31 | 18974 | 3.9%
10 | 18138 | 3.7%
5 | 16880 | 3.4%
1 | 15160 | 3.1%
37 | 15107 | 3.1%
13 | 14507 | 2.9%
45 | 14377 | 2.9%
22 | 14102 | 2.9%
Other values (14) | 50261 | 10.2%
(Missing) | 248409 | 50.4%

Value | Count | Frequency (%)
1 | 15160 | 3.1%
5 | 16880 | 3.4%
6 | 420 | 0.1%
9 | 5923 | 1.2%
10 | 18138 | 3.7%
13 | 14507 | 2.9%
14 | 34672 | 7.0%
18 | 12550 | 2.5%
22 | 14102 | 2.9%
23 | 2108 | 0.4%

Value | Count | Frequency (%)
50 | 434 | 0.1%
49 | 410 | 0.1%
48 | 3894 | 0.8%
45 | 14377 | 2.9%
44 | 1268 | 0.3%
40 | 32233 | 6.5%
39 | 2484 | 0.5%
37 | 15107 | 3.1%
36 | 4298 | 0.9%
35 | 10829 | 2.2%

Promo2SinceYear
Real number (ℝ≥0)

HIGH CORRELATION
MISSING

Distinct: 7
Distinct (%): < 0.1%
Missing: 248409
Missing (%): 50.4%
Infinite: 0
Infinite (%): 0.0%
Mean: 2011.761476
Minimum: 2009
Maximum: 2015
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 3.8 MiB

Quantile statistics

Minimum: 2009
5-th percentile: 2009
Q1: 2011
Median: 2012
Q3: 2013
95-th percentile: 2014
Maximum: 2015
Range: 6
Interquartile range (IQR): 2

Descriptive statistics

Standard deviation: 1.669396522
Coefficient of variation (CV): 0.0008298183172
Kurtosis: -1.056226033
Mean: 2011.761476
Median Absolute Deviation (MAD): 1
Skewness: -0.120639579
Sum: 491696634
Variance: 2.786884749
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=7)

Value | Count | Frequency (%)
2011 | 54789 | 11.1%
2013 | 51889 | 10.5%
2014 | 40022 | 8.1%
2012 | 34913 | 7.1%
2009 | 31156 | 6.3%
2010 | 27380 | 5.6%
2015 | 4262 | 0.9%
(Missing) | 248409 | 50.4%

Value | Count | Frequency (%)
2009 | 31156 | 6.3%
2010 | 27380 | 5.6%
2011 | 54789 | 11.1%
2012 | 34913 | 7.1%
2013 | 51889 | 10.5%
2014 | 40022 | 8.1%
2015 | 4262 | 0.9%

Value | Count | Frequency (%)
2015 | 4262 | 0.9%
2014 | 40022 | 8.1%
2013 | 51889 | 10.5%
2012 | 34913 | 7.1%
2011 | 54789 | 11.1%
2010 | 27380 | 5.6%
2009 | 31156 | 6.3%

PromoInterval
Categorical

HIGH CORRELATION
MISSING

Distinct: 3
Distinct (%): < 0.1%
Missing: 248409
Missing (%): 50.4%
Memory size: 3.8 MiB

Value | Count
Jan,Apr,Jul,Oct | 142670
Feb,May,Aug,Nov | 56032
Mar,Jun,Sept,Dec | 45709

Length

Max length: 16
Median length: 15
Mean length: 15.18701695
Min length: 15

Characters and Unicode

Total characters: 0
Distinct characters: 0
Distinct categories: 0
Distinct scripts: 0
Distinct blocks: 0

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Jan,Apr,Jul,Oct
2nd row: Jan,Apr,Jul,Oct
3rd row: Feb,May,Aug,Nov
4th row: Jan,Apr,Jul,Oct
5th row: Feb,May,Aug,Nov

Common Values

Value | Count | Frequency (%)
Jan,Apr,Jul,Oct | 142670 | 28.9%
Feb,May,Aug,Nov | 56032 | 11.4%
Mar,Jun,Sept,Dec | 45709 | 9.3%
(Missing) | 248409 | 50.4%

Length

Histogram of lengths of the category

Pie chart

Value | Count | Frequency (%)
jan,apr,jul,oct | 142670 | 58.4%
feb,may,aug,nov | 56032 | 22.9%
mar,jun,sept,dec | 45709 | 18.7%

Interactions

Correlations

Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better at catching nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1, with -1 indicating total negative monotonic correlation, 0 indicating no monotonic correlation and +1 indicating total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
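
The definition above can be sketched in plain Python: rank each variable (tied values share the mean of their positions), then apply the covariance-over-standard-deviations formula to the ranks. This is a minimal illustration with made-up data, not the report's own implementation; the helper names `average_ranks`, `pearson` and `spearman` are hypothetical.

```python
from statistics import mean

def average_ranks(values):
    """Rank values from 1..n; tied values receive the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks

def pearson(x, y):
    """Covariance divided by the product of standard deviations (the 1/n factors cancel)."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

def spearman(x, y):
    """Spearman's rho: Pearson's r computed on the rank variables."""
    return pearson(average_ranks(x), average_ranks(y))

# Illustrative data: y = x**3 is monotonic in x but not linear.
x = [1, 2, 3, 4, 5]
y = [1, 8, 27, 64, 125]
print("Pearson r: ", pearson(x, y))
print("Spearman ρ:", spearman(x, y))
```

On this data ρ is at its maximum while r falls short of 1, which is the difference between monotonic and linear association described above.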

Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1, with -1 indicating total negative linear correlation, 0 indicating no linear correlation and +1 indicating total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, so for a linear relationship the steepness of the line does not affect r; only the sign of the slope does.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
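
As a small sketch of that formula, the function below computes r from the population covariance and standard deviations, then checks the location/scale invariance by shifting and rescaling both variables with positive factors. The data and the name `pearson_r` are illustrative assumptions, not taken from the report.

```python
from statistics import mean

def pearson_r(x, y):
    """r = cov(X, Y) / (std(X) * std(Y)), using population (1/n) moments."""
    n = len(x)
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    std_x = (sum((a - mx) ** 2 for a in x) / n) ** 0.5
    std_y = (sum((b - my) ** 2 for b in y) / n) ** 0.5
    return cov / (std_x * std_y)

# Illustrative, roughly linear data.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 8.1, 9.8]

r = pearson_r(x, y)
# Shift and rescale each variable separately (positive scale factors).
r_transformed = pearson_r([10 * a + 3 for a in x], [0.5 * b - 7 for b in y])
# A negative scale factor flips the sign of r but not its magnitude.
r_flipped = pearson_r(x, [-b for b in y])

print(r, r_transformed, r_flipped)
```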

Kendall's τ

Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1, with -1 indicating total negative correlation, 0 indicating no correlation and +1 indicating total positive correlation.

To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is given by the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
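
The pair-counting definition above corresponds to the variant usually called tau-a. A minimal sketch follows, with illustrative data and a hypothetical function name. Note that library implementations commonly report the tie-adjusted tau-b instead, so values can differ when the data contain ties.

```python
from itertools import combinations

def kendall_tau_a(x, y):
    """(concordant - discordant) / total number of pairs; ties count as neither."""
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        direction = (x1 - x2) * (y1 - y2)
        if direction > 0:
            concordant += 1   # both variables move in the same direction
        elif direction < 0:
            discordant += 1   # the variables move in opposite directions
    n = len(x)
    total_pairs = n * (n - 1) // 2
    return (concordant - discordant) / total_pairs

# Illustrative data: 7 concordant and 3 discordant pairs out of 10.
x = [1, 2, 3, 4, 5]
y = [3, 1, 4, 2, 5]
print(kendall_tau_a(x, y))
```

The quadratic loop over all pairs is fine for illustration but would be slow on a dataset of the size profiled here.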

Cramér's V (φc)

Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been shown to be biased, even for large samples. We use the bias-corrected measure proposed by Bergsma in 2013.
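
For orientation, the sketch below computes a bias-corrected Cramér's V from two lists of category labels using the correction commonly attributed to Bergsma (2013): the chi-squared statistic is scaled by n, reduced by (k-1)(r-1)/(n-1), and normalised with corrected row and column counts. This is an assumption about the form of the correction and not the report's code; the function name and the toy data are hypothetical.

```python
from collections import Counter
from math import sqrt

def cramers_v_corrected(xs, ys):
    """Bias-corrected Cramér's V for two equally long sequences of labels."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    row_totals = Counter(xs)
    col_totals = Counter(ys)
    r, k = len(row_totals), len(col_totals)

    # Pearson chi-squared statistic over the full contingency table.
    chi2 = 0.0
    for a, row_n in row_totals.items():
        for b, col_n in col_totals.items():
            expected = row_n * col_n / n
            observed = joint.get((a, b), 0)
            chi2 += (observed - expected) ** 2 / expected

    phi2 = chi2 / n
    phi2_corr = max(0.0, phi2 - (k - 1) * (r - 1) / (n - 1))
    r_corr = r - (r - 1) ** 2 / (n - 1)
    k_corr = k - (k - 1) ** 2 / (n - 1)
    denom = min(k_corr - 1, r_corr - 1)
    if denom <= 0:
        return 0.0  # degenerate table (a variable with a single category)
    return sqrt(phi2_corr / denom)

# Toy data: the second variable is fully determined by the first.
store_type = ["a", "b", "c"] * 20
assortment = [{"a": "x", "b": "y", "c": "z"}[s] for s in store_type]
print(cramers_v_corrected(store_type, assortment))

# Toy data: the two variables are independent by construction.
left = ["a", "a", "b", "b"] * 10
right = ["x", "y", "x", "y"] * 10
print(cramers_v_corrected(left, right))
```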

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency and reverts to the Pearson correlation coefficient in the case of a bivariate normal input distribution. Extensive documentation is available in the Phik project's online documentation.

Missing values

A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion.
The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.
The dendrogram lets you correlate variable completion more fully, revealing trends deeper than the pairwise ones visible in the correlation heatmap.
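
Two of these ideas, nullity by column and nullity correlation, can be sketched without any plotting. The example below uses three columns from the first ten sample rows of this report, with NaN written as `None`. It treats nullity correlation as the Pearson correlation between missing-value indicators, which is an assumption about the heatmap rather than something the report states; the function names are hypothetical.

```python
from statistics import mean

# Three columns from the report's first ten sample rows; None stands for NaN.
columns = {
    "CompetitionOpenSinceMonth": [5.0, 5.0, 3.0, 3.0, 2.0, 11.0, None, 6.0, 12.0, 8.0],
    "Promo2SinceWeek": [40.0, 40.0, None, 10.0, None, None, 40.0, None, None, None],
    "PromoInterval": ["Jan,Apr,Jul,Oct", "Jan,Apr,Jul,Oct", None, "Feb,May,Aug,Nov",
                      None, None, "Jan,Apr,Jul,Oct", None, None, None],
}

def missing_mask(values):
    """1 where the value is missing, 0 where it is present."""
    return [1 if v is None else 0 for v in values]

def missing_share(values):
    """Fraction of missing entries in a column (nullity by column)."""
    return sum(missing_mask(values)) / len(values)

def nullity_correlation(a, b):
    """Pearson correlation of two missing-value indicators, or None if undefined."""
    ma, mb = missing_mask(a), missing_mask(b)
    mean_a, mean_b = mean(ma), mean(mb)
    cov = sum((p - mean_a) * (q - mean_b) for p, q in zip(ma, mb))
    var_a = sum((p - mean_a) ** 2 for p in ma)
    var_b = sum((q - mean_b) ** 2 for q in mb)
    if var_a == 0 or var_b == 0:
        return None  # a column that is never (or always) missing has no variance
    return cov / (var_a * var_b) ** 0.5

for name, values in columns.items():
    print(f"{name}: {missing_share(values):.0%} missing")

print(nullity_correlation(columns["Promo2SinceWeek"], columns["PromoInterval"]))
print(nullity_correlation(columns["CompetitionOpenSinceMonth"], columns["Promo2SinceWeek"]))
```

In this small sample Promo2SinceWeek and PromoInterval are missing on exactly the same rows, in line with the identical missing counts (248409) the report gives for the Promo2 detail columns.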

Sample

First rows

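index | df_index | Date | Store | DayOfWeek | Open | Promo | StateHoliday | SchoolHoliday | StoreType | Assortment | CompetitionDistance | CompetitionOpenSinceMonth | CompetitionOpenSinceYear | Promo2 | Promo2SinceWeek | Promo2SinceYear | PromoInterval
0 | 231958 | 2013-08-04 | 275 | 7.0 | 0.0 | 0.0 | 0 | 0.0 | d | a | 300.0 | 5.0 | 2014.0 | 1.0 | 40.0 | 2014.0 | Jan,Apr,Jul,Oct
1 | 284375 | 2013-09-22 | 287 | 7.0 | 0.0 | 0.0 | 0 | 0.0 | c | a | 2740.0 | 5.0 | 2009.0 | 1.0 | 40.0 | 2014.0 | Jan,Apr,Jul,Oct
2 | 476572 | 2014-03-19 | 922 | 3.0 | 1.0 | 1.0 | 0.0 | 0.0 | d | a | 2110.0 | 3.0 | 2006.0 | 0.0 | NaN | NaN | NaN
3 | 594375 | 2014-07-08 | 406 | 2.0 | 1.0 | 0.0 | 0 | 0.0 | d | c | 8240.0 | 3.0 | 2001.0 | 1.0 | 10.0 | 2013.0 | Feb,May,Aug,Nov
4 | 417389 | 2014-01-23 | 860 | 4.0 | 1.0 | 1.0 | 0 | 0.0 | c | c | 5980.0 | 2.0 | 2010.0 | 0.0 | NaN | NaN | NaN
5 | 143522 | 2013-05-14 | 109 | 2.0 | 1.0 | 1.0 | 0 | 0.0 | a | c | 3300.0 | 11.0 | 2010.0 | 0.0 | NaN | NaN | NaN
6 | 78559 | 2013-03-14 | 697 | 4.0 | 1.0 | 0.0 | 0 | 0.0 | d | a | 3780.0 | NaN | NaN | 1.0 | 40.0 | 2011.0 | Jan,Apr,Jul,Oct
7 | 362197 | 2013-12-03 | 57 | 2.0 | 1.0 | 1.0 | 0 | 0.0 | d | c | 420.0 | 6.0 | 2014.0 | 0.0 | NaN | NaN | NaN
8 | 386079 | 2013-12-25 | 173 | 3.0 | 0.0 | 0.0 | c | 1.0 | a | a | 350.0 | 12.0 | 2012.0 | 0.0 | NaN | NaN | NaN
9 | 459060 | 2014-03-03 | 84 | 1.0 | 1.0 | 1.0 | 0.0 | 0.0 | a | c | 11810.0 | 8.0 | 2014.0 | 0.0 | NaN | NaN | NaN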

Last rows

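index | df_index | Date | Store | DayOfWeek | Open | Promo | StateHoliday | SchoolHoliday | StoreType | Assortment | CompetitionDistance | CompetitionOpenSinceMonth | CompetitionOpenSinceYear | Promo2 | Promo2SinceWeek | Promo2SinceYear | PromoInterval
492810 | 175203 | 2013-06-12 | 1039 | 3.0 | 1.0 | 0.0 | 0 | 0.0 | a | c | 70.0 | 6.0 | 1990.0 | 1.0 | 22.0 | 2012.0 | Mar,Jun,Sept,Dec
492811 | 87498 | 2013-03-23 | 536 | 6.0 | 1.0 | 0.0 | 0 | 0.0 | a | c | 4700.0 | 9.0 | 2002.0 | 1.0 | 31.0 | 2013.0 | Feb,May,Aug,Nov
492812 | 521430 | 2014-04-30 | 523 | 3.0 | 1.0 | 1.0 | 0 | 0.0 | c | c | 50.0 | 11.0 | 2013.0 | 0.0 | NaN | NaN | NaN
492813 | 137337 | 2013-05-08 | 953 | 3.0 | 1.0 | 0.0 | 0 | 0.0 | a | a | 19830.0 | 4.0 | 2006.0 | 1.0 | 22.0 | 2011.0 | Mar,Jun,Sept,Dec
492814 | 54886 | 2013-02-20 | 779 | 3.0 | 1.0 | 1.0 | 0 | 0.0 | a | a | 16990.0 | 4.0 | 2004.0 | 0.0 | NaN | NaN | NaN
492815 | 110268 | 2013-04-13 | 242 | 6.0 | 1.0 | 0.0 | 0 | 0.0 | d | a | 6880.0 | 9.0 | 2001.0 | 1.0 | 14.0 | 2011.0 | Jan,Apr,Jul,Oct
492816 | 259178 | 2013-08-29 | 979 | NaN | 1.0 | 1.0 | 0 | 0.0 | a | c | 2270.0 | 11.0 | 2005.0 | 1.0 | 14.0 | 2011.0 | Jan,Apr,Jul,Oct
492817 | 365838 | 2013-12-06 | 980 | 5.0 | 1.0 | 1.0 | 0 | 0.0 | a | a | 4420.0 | 9.0 | 2005.0 | 0.0 | NaN | NaN | NaN
492818 | 131932 | 2013-05-03 | 0 | 5.0 | 1.0 | 1.0 | 0 | 0.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN
492819 | 121958 | 2013-04-24 | 80 | NaN | 1.0 | 1.0 | 0 | 0.0 | d | a | 7910.0 | NaN | NaN | 0.0 | NaN | NaN | NaN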